Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields

Authors

  • Georgi Georgiev
  • Preslav Nakov
  • Kuzman Ganchev
  • Petya Osenova
  • Kiril Ivanov Simov
Abstract

The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in Bulgarian news text. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1 = 89.4%, which is comparable to state-of-the-art results for English.
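
To make the idea of combining general and language-specific features more concrete, the sketch below builds per-token feature dictionaries of the kind a linear-chain CRF is typically trained on: orthographic features (lowercased form, suffix, word shape), a full morpho-syntactic tag plus a reduced "local" tag derived from it, gazetteer membership, and a one-token context window. This is not the authors' feature set. The positional tag format (first character = part of speech, second = subtype), the truncation rule in the hypothetical reduce_tag helper, the gazetteer contents, and all function names are illustrative assumptions; nonlocal features and unlabeled data are not modeled.

```python
from typing import Dict, List, Set


def reduce_tag(full_tag: str, length: int = 2) -> str:
    """Derive a coarser 'local' tag by keeping the leading tag positions.

    Assumption (hypothetical): tags are positional strings such as "Npfsi",
    where the first character is the part of speech and the second the subtype.
    """
    return full_tag[:length]


def word_shape(token: str) -> str:
    """Map upper/lower/digit characters to X/x/d and collapse repeated symbols."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch  # keep punctuation as-is
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def token_features(tokens: List[str], tags: List[str], i: int,
                   gazetteers: Dict[str, Set[str]]) -> Dict[str, object]:
    """Build a feature dictionary for token i of one sentence."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": tok.lower(),
        "word.suffix3": tok[-3:].lower(),
        "word.shape": word_shape(tok),
        "word.istitle": tok.istitle(),
        "tag.full": tags[i],              # full morpho-syntactic tag
        "tag.local": reduce_tag(tags[i]),  # reduced task-specific tag
    }
    # Gazetteer lookups (lowercased, single-token entries only in this sketch).
    for name, entries in gazetteers.items():
        feats[f"gaz.{name}"] = tok.lower() in entries
    # Context window of one token on each side.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:tag.local"] = reduce_tag(tags[i - 1])
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:tag.local"] = reduce_tag(tags[i + 1])
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str], tags: List[str],
                      gazetteers: Dict[str, Set[str]]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, tags, i, gazetteers) for i in range(len(tokens))]
```

For example, calling sentence_features(["Иван", "живее", "в", "София", "."], ["Npmsi", "Vpitf-r3s", "R", "Npfsi", "U"], {"location": {"софия"}, "person_first": {"иван"}}) with made-up tags yields five dictionaries; the one for "София" has word shape "Xx", local tag "Np", and gaz.location set to True. A list of such dictionaries per sentence is roughly the input shape that CRF toolkits such as sklearn-crfsuite accept.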

Publication date: 2009